Molecular basis of C9orf72 poly-PR interference with the β-karyopherin family of nuclear transport receptors

Nucleocytoplasmic transport (NCT) is affected in several neurodegenerative diseases including C9orf72-ALS. It has recently been found that arginine-containing dipeptide repeat proteins (R-DPRs), translated from C9orf72 repeat expansions, directly bind to several importins. To gain insight into how this can affect nucleocytoplasmic transport, we use coarse-grained molecular dynamics simulations to study the molecular interaction of poly-PR, the most toxic DPR, with several Kapβs (importins and exportins). We show that poly-PR–Kapβ binding depends on the net charge per residue (NCPR) of the Kapβ, salt concentration of the solvent, and poly-PR length. Poly-PR makes contact with the inner surface of most importins, which strongly interferes with Kapβ binding to cargo-NLS, IBB, and RanGTP in a poly-PR length-dependent manner. Longer poly-PRs at higher concentrations are also able to make contact with the outer surface of importins that contain several binding sites to FG-Nups. We also show that poly-PR binds to exportins, especially at lower salt concentrations, interacting with several RanGTP and FG-Nup binding sites. Overall, our results suggest that poly-PR might cause length-dependent defects in cargo loading, cargo release, Kapβ transport and Ran gradient across the nuclear envelope.

The 1BPA force field has previously been used to study intrinsically disordered FG-Nups and dipeptide repeat proteins (DPRs) [3,4]. The bonded interactions in this force field, i.e. the bending and torsion potentials, are residue and sequence specific. The attractive hydrophobic and repulsive hydrophilic interactions between different residues are represented by the potential $\phi_{hp}$; see Table S1 [2]. The hydrophobic strength values of charged residues are slightly increased in line with our recent work [4].
The electrostatic interactions between charged residues are described by a modified Coulomb potential, $\phi_{elec}$.

Poly-PR-Kap interaction
Poly-PR has been shown to bind to several importins in in vitro experiments [5]. However, no binding has been observed for the more hydrophobic DPRs, i.e. poly-GA and poly-GP [5].
These observations suggest the importance of Arginine in driving the binding between poly-PR and the Kaps. At physiological salt concentrations, Arginine mainly engages in electrostatic and cation-pi interactions. For the poly-PR-Kap interaction, we use the same electrostatic potential ($\phi_{elec}$) as described in the previous section. To take into account the cation-pi interactions between Arginine (in poly-PR) and the aromatic residues Phenylalanine, Tyrosine and Tryptophan (in Kap), we use an 8-6 Lennard-Jones (LJ) potential that replaces $\phi_{hp}$ for the RF, RY, and RW interactions: $\phi_{cp,ij}(r) = \epsilon_{cp,ij} [3 (r_m/r)^8 - 4 (r_m/r)^6]$, where $\epsilon_{cp,ij}$ is a pair-dependent cation-pi energy. The parameter $r_m$, which is the distance at which $\phi_{cp,ij}$ reaches its minimum value, is set to 0.45 nm. This value is the weighted average distance between the guanidinium group of Arginine and an aromatic ring at different orientations (Planar, Oblique, Orthogonal) [6]. This value of $r_m$ also lies in the range used to find cation-pi structures involving both Arginine and Lysine in the Protein Data Bank (PDB) [7].
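As a quick check of the 8-6 form above, the following minimal Python sketch evaluates the potential and confirms that its minimum value is the negative of the cation-pi energy at a separation of 0.45 nm. The function name and the test distances are illustrative choices, not part of the simulation code; the energy of 5 kJ/mol is the RY value used in this work.

```python
def cation_pi_potential(r, eps_cp, r_m=0.45):
    """8-6 Lennard-Jones form: eps_cp * [3 (r_m/r)^8 - 4 (r_m/r)^6].

    r and r_m are in nm, eps_cp and the returned energy in kJ/mol.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    x = r_m / r
    return eps_cp * (3.0 * x**8 - 4.0 * x**6)


if __name__ == "__main__":
    eps_ry = 5.0  # kJ/mol, RY cation-pi energy
    for r in (0.40, 0.45, 0.50, 1.00):  # illustrative distances in nm
        print(f"r = {r:.2f} nm  phi = {cation_pi_potential(r, eps_ry):8.3f} kJ/mol")
```

The prefactors 3 and 4 make the derivative vanish at the chosen minimum distance, so the well depth can be set directly through the pair energy.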
The Arginine interaction energy $\epsilon_{cp,ij}$ with the aromatic side chains of Phenylalanine, Tyrosine, and Tryptophan varies between different pairs [6-9]. In the present study we set the RY cation-pi energy as a basis for calculating the cation-pi energies for the other combinations using PDB statistics. According to all-atom free energy calculations, the RY interaction energy is comparable to the strongest interaction between different non-charged residues at physiological salt concentrations [10]. Therefore, in order for the cation-pi interactions to be compatible with the 1BPA force field, we set $\epsilon_{cp,RY}$ = 5 kJ/mol, which is similar to the deepest potential well in the 1BPA force field (5.2 kJ/mol).
To estimate the energy difference between RY and the other combinations, similar to [11,12], we use the PDB cation-pi contact frequencies in an aqueous environment [7]. Based on the frequencies of individual residues as well as the frequencies of cation-pi pairs within a large dataset of proteins (see Table S2, taken from [7]), the energy differences between different cation-pi pairs can be estimated using a simple formulation of a statistical potential [11,13].
For the hydrophilic/hydrophobic interactions between poly-PR and the rest of the Kap residues (the grey residues in figure 1a), we use $\phi_{hp}$ with an energy parameter of 10 kJ/mol, which leads to an excluded volume potential that vanishes at $r$ = 0.6 nm.
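The statistical-potential estimate of the cation-pi energy differences can be illustrated with a short Python sketch. The exact expression used in this work is not reproduced here; the sketch assumes a common inverse-Boltzmann form in which the pair propensity is the pair frequency divided by the product of the individual residue frequencies, and the energy shift relative to RY is kT times the logarithm of the propensity ratio. The temperature (300 K), the function names, and any input numbers are assumptions for illustration only and are not the values in Table S2.

```python
import math

KT = 2.494  # kJ/mol, thermal energy at roughly 300 K (assumed temperature)


def pair_propensity(pair_freq, freq_i, freq_j):
    """Observed pair frequency normalised by the individual residue frequencies."""
    return pair_freq / (freq_i * freq_j)


def cation_pi_energy(eps_ref, prop_pair, prop_ref, kT=KT):
    """Well depth of a pair relative to a reference pair (assumed form).

    A pair that is over-represented relative to the reference gets a deeper
    well: eps_pair = eps_ref + kT * ln(prop_pair / prop_ref).
    """
    return eps_ref + kT * math.log(prop_pair / prop_ref)
```

With this convention the reference pair reproduces its own energy, and a pair that occurs twice as often as expected relative to the reference is stronger by kT ln 2, about 1.7 kJ/mol at the assumed temperature.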

Developing 1BPA models of Kaps from the crystal structures
To develop 1BPA coarse-grained (CG) models of the Kaps we use the crystal structures listed in Table S3. For all the Kaps listed in this table, except Imp1 and CRM1, the crystal structure of the unbound state is available. In cases where more than one crystal structure is available, we use the one with the higher resolution. For Imp1 (876 residues) and CRM1 (1071 residues), we use Robetta [14] to predict the structures. Due to the sequence length limit in Robetta, for CRM1 we obtain the structure for residues 72-1071, which includes the C-extension domain that has been shown to play important roles in cargo loading inhibition in the absence of RanGTP [15]. The CG models of Kaps are built by placing beads at the positions of the α-carbons in the crystal structures and introducing a network of stiff harmonic bonds that maintains the secondary and tertiary structure of the NTRs. This network of bonds is represented by the harmonic potential $\phi_{network} = K (r - b)^2$, where $K$ is 8000 kJ/mol/nm$^2$ and $b$ is the original distance between the amino acid beads in the crystal structure.
A bond is made between the beads if $b$ is less than 1.4 nm.
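The construction of the bond network can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations: bead coordinates are taken as a plain list of (x, y, z) tuples in nm, every pair of beads closer than the cutoff is bonded, and the function names are hypothetical.

```python
import math

K_NETWORK = 8000.0  # kJ/mol/nm^2, bond stiffness
CUTOFF = 1.4        # nm, maximum reference distance for a bond


def build_network(coords, cutoff=CUTOFF):
    """Return (i, j, b) for every bead pair whose reference distance b is below the cutoff."""
    bonds = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            b = math.dist(coords[i], coords[j])
            if b < cutoff:
                bonds.append((i, j, b))
    return bonds


def network_energy(coords, bonds, k=K_NETWORK):
    """Sum of k * (r - b)^2 over all network bonds for the current coordinates."""
    return sum(k * (math.dist(coords[i], coords[j]) - b) ** 2 for i, j, b in bonds)
```

The energy is zero for the reference structure by construction, so any deformation of the folded Kap is penalised relative to the starting geometry. The pair loop scales quadratically with the number of beads, which is acceptable for proteins of roughly a thousand residues.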
There are missing regions in the crystal structure of some Kaps. Some of these missing regions contain tracts of negatively-charged residues that might play a role in the interaction of the Kaps with poly-PR. The missing regions in X-ray crystallography are known to be more flexible and more disordered than the observed regions. Here we use PSIPRED to predict the secondary structure of the missing regions [16], see Figure S1. The results show that almost all the missing regions (except a 6-residue-long missing region in a B-helix of KAP121) have more than 50% of their residues in a coil conformation. These regions are added to the CG models and considered to be disordered. For the interactions between the residues within these regions we use the 1BPA force field featuring $\phi_{hp}$ and $\phi_{elec}$ as described above.

Analyzing the interaction between poly-PR and Kaps
To analyze the binding between poly-PR and the Kaps, we calculate the time-averaged number of contacts and the binding probability. The contact probability for each Kap residue in figures 3a and S7 is the probability of having at least one poly-PR residue within 1 nm of the Kap residue. Similar to the definition of the binding probability, we calculate the contact probability for each Kap residue at equilibrium by dividing the number of frames for which this contact criterion is satisfied by the total number of frames. Residue $i$ is considered to be a contact site if the contact probability for this residue is larger than 0.10. $N_{contact}$ is the number of Kap residues that satisfy this criterion, and $N_{shared}$ is the number of Kap residues that make contact with poly-PR (obtained in our simulations) and at the same time are known for recognition of native binding partners of Kaps (i.e., NLS/NES-cargo, IBB domain, RanGTP, and FG-Nups, obtained using PiSITE [17], see section 4 for more details).
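The contact analysis described above can be sketched as follows. This is a minimal illustration, assuming the trajectory is available as per-frame lists of bead coordinates in nm; the function names, the data layout, and the strict "less than" comparison at the 1 nm cutoff are assumptions, not details taken from the analysis scripts.

```python
import math


def contact_probabilities(kap_frames, pr_frames, cutoff=1.0):
    """Fraction of frames in which each Kap bead has at least one poly-PR bead within the cutoff."""
    n_frames = len(kap_frames)
    hits = [0] * len(kap_frames[0])
    for kap, pr in zip(kap_frames, pr_frames):
        for i, kap_bead in enumerate(kap):
            if any(math.dist(kap_bead, pr_bead) < cutoff for pr_bead in pr):
                hits[i] += 1
    return [h / n_frames for h in hits]


def contact_sites(probabilities, threshold=0.10):
    """Indices of Kap residues whose contact probability exceeds the threshold."""
    return {i for i, p in enumerate(probabilities) if p > threshold}


def count_shared(sites, known_binding_residues):
    """Number of contact sites that are also known binding residues (e.g. from PiSITE)."""
    return len(sites & set(known_binding_residues))
```

In this layout the number of contact sites is `len(contact_sites(...))`, and the shared count is obtained by intersecting the contact sites with a residue list for a given binding partner.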

Estimating the amino acid sequence of A- and B-helices
To estimate the amino acid sequences of the A- and B-helices for each Kap, we first use the STRIDE secondary structure assignment algorithm in VMD [18] to find all the α-helices. We then exclude the small helices, which usually have fewer than 6-9 residues, and consider them to be part of the linkers. In most cases, these small helices are located between two coil regions inside the linkers. However, in a few cases, these helices are connected to larger helices without a coil region in between. For these cases, a helical twist can be seen where the two helices are connected. The approximate location of the twist is found visually and is further checked by obtaining the backbone dihedral angles $\phi$ and $\psi$ for the residues around the twist (see Figure S4). At the location of the twist, one of these dihedral angles changes sign. For Imp1 and KAP95 we also exclude longer α-helices, which contain 13 and 14 residues, in the linkers between HEAT repeats 2 and 3. After deleting the α-helices inside the linkers, the number of remaining helices is equal to twice the number of HEAT repeats reported in previous studies, see Table S3 for the information about the HEAT repeats. We exclude KAP120 from our analysis because the number of HEAT repeats has not been reported. We also take into account the exceptions mentioned in the literature, see the comments in Table S4 for Imp1, TNPO3, and XPO5. A- and B-helices are highlighted in the crystal structures presented in Table S4.
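The first filtering step, removing short helices from a per-residue secondary structure assignment, can be sketched as below. This is only an illustration: it assumes the assignment is available as a string of one-letter codes with "H" marking α-helix, and it uses a single hypothetical minimum length, whereas the procedure above uses a case-dependent cutoff of 6-9 residues plus visual inspection of twists and literature exceptions, none of which is automated here.

```python
def helix_segments(ss, helix_code="H"):
    """Return (start, end) index pairs (inclusive) of consecutive helix residues."""
    segments, start = [], None
    for i, code in enumerate(ss):
        if code == helix_code and start is None:
            start = i
        elif code != helix_code and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(ss) - 1))
    return segments


def drop_short_helices(segments, min_len):
    """Keep only helices with at least min_len residues; shorter ones count as linker."""
    return [(s, e) for s, e in segments if e - s + 1 >= min_len]
```

A useful consistency check after filtering is that the number of remaining helices equals twice the number of HEAT repeats reported for the Kap.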

Using PiSITE to obtain the Kap binding sites
We use PiSITE to find residues of Kaps that interact with protein cargoes, IBB domains, RanGTP, and FG-Nups; see Table S3 for more details. For importins, residues that bind to NLS-cargo and IBB domains, and for exportins residues that bind to NES-cargo, are shown with vertical black lines. The residues that bind to RanGTP and FG-Nups are shown with vertical green and orange lines, respectively. The group RanGTP contains binding residues for both RanGTP and

Figure S8
The number of residues in each region of the Kaps that make contact with poly-PR, $N_{contact}$, at $C_{salt}$ = 100 mM, plotted for different concentrations of PR7, PR20, PR35, and PR50. Concentration is expressed as the number of poly-PR molecules.

Figure S9
The number of Kap residues that make contact with poly-PR, contact , at salt = 100 mM plotted against the number of PR7 molecules, .     * Among the five structures predicted by Robetta, we use the one that has the lowest prediction error.
** For the following cases we add the binding residues suggested in the literature to the list of binding residues obtained from PiSITE. These cases are Imp1 (PDB code 1m5n [19]), TNPO1 (PDB codes 2z5n [20], 4fdd [21], 5j3v [22]), and TNPO3 (PDB code 6gx9 [23]). These crystal structures are not analyzed by PiSITE. This update only adds a few new binding residues (< 5 residues for each Kap) to the list of binding residues obtained from PiSITE.

Supplementary movies
Movie S1 Binding of one copy of PR7 to Imp1. PR7 is in red and Imp1 is in light grey.
Movie S2 Binding of one copy of PR50 to Imp1. PR50 is in red and Imp1 is in light grey.

Movie S3 Binding of three copies of PR35 to KAP95. PR35 molecules are in red. A- and B-helices of KAP95 are in light blue and yellow, respectively. Linkers are shown with transparent beads.